The notation, equations, and symbols follow https://theclevermachine.wordpress.com/2014/09/06/derivation-error-backpropagation-gradient-descent-for-neural-networks/. These equations were derived in our last session.
$z_j$: input to node $j$ for layer $l$
$g_j$: activation function for node $j$ in layer $l$ (applied to $z_j$)
$a_j = g_j(z_j)$: output/activation of node $j$ in layer $l$
$w_{i,j}$: weights connecting node $i$ in layer $(l-1)$ to node $j$ in layer $l$
$t_k$: target value for node $k$ in the output layer
$\delta_k = (a_k - t_k)\,g_k'(z_k)$: error (delta) term for node $k$ in the output layer
$\frac{\partial E}{\partial w_{j,k}} = \delta_k a_j$: gradient of the error with respect to the weight connecting hidden node $j$ to output node $k$
This notation was really confusing, so I had to break it down line by line. When I am looking at the output layer, $a_k$ is the output of node $k$ in the output layer and $a_j$ is the output coming from my hidden layer. $t_k$ is my target value, and $g_k'(z_k)$ is the derivative of the output activation evaluated at $z_k$. The delta term is what I will plug into gradient descent.
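To convince myself these two formulas do what I think, here is a minimal numpy sketch with made-up numbers (the array values, and helper names like d_sigmoid, are just for illustration and not part of the derivation):
In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

# made-up values for one training example
a_j = np.array([0.2, 0.7])   # activations of two hidden nodes
z_k = np.array([0.5])        # input to the single output node
a_k = sigmoid(z_k)           # output of the output node
t_k = np.array([1.0])        # target

# delta_k = (a_k - t_k) * g'(z_k)
delta_k = (a_k - t_k) * d_sigmoid(z_k)

# dE/dw_jk = delta_k * a_j -> one gradient per hidden-to-output weight
dE_dw_jk = np.outer(a_j, delta_k)
delta_k, dE_dw_jk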
$\delta_j = g_j'(z_j)\sum_{k=1}^K \delta_k w_{j,k}$
Here $j$ indexes the hidden-layer node that the weight originates from, and $k$ indexes the output-layer node that the weight points to. In the case of a model with one hidden layer, $j$ runs over the hidden layer and $k$ runs over the output layer. You want to sum over all of the output nodes this hidden node feeds into, because when you do backprop, every one of those nodes is affected by a change in this node's activation. $\delta_k$ is the delta for the $k$th output node.
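Same sanity check for the hidden-layer delta, again with made-up numbers; w_jk here is a hypothetical 2x1 matrix of hidden-to-output weights:
In [ ]:
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    return sigmoid(x) * (1 - sigmoid(x))

z_j = np.array([0.1, -0.3])       # inputs to the two hidden nodes
delta_k = np.array([-0.12])       # output-layer delta from the previous step
w_jk = np.array([[0.4], [-0.9]])  # hidden-to-output weights, shape (n_hidden, n_output)

# delta_j = g'(z_j) * sum_k delta_k * w_jk
delta_j = d_sigmoid(z_j) * np.dot(w_jk, delta_k)
delta_j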
In [22]:
import numpy as np

class NN_backprop:
    def __init__(self, n_input, n_hidden, n_nodes):
        self.n_input = n_input + 1        # +1, presumably for a bias unit (not used yet)
        self.n_layers = n_hidden + 2      # input layer + hidden layers + output layer
        self.n_nodes = n_nodes
        # z_output[l] holds the inputs z_j to layer l, a_output[l] the activations a_j = g_j(z_j)
        self.z_output = np.ones((self.n_layers, self.n_nodes))
        self.a_output = np.ones((self.n_layers, self.n_nodes))
        self.delta = np.ones((self.n_layers, self.n_nodes))
        # w[l, i, j] connects node i in layer (l-1) to node j in layer l
        self.w = np.random.rand(self.n_layers, self.n_nodes, self.n_nodes)
        # stores the gradients used to update the weights
        self.c_ = np.zeros((self.n_layers, self.n_nodes, self.n_nodes))

    def runNN(self, inputs):
        self.a_output[0] = inputs
        # basically doing sigmoid(w^l a^(l-1)) to find the output of each layer
        for h in range(1, self.n_layers):
            for z_ in range(self.n_nodes):
                self.z_output[h][z_] = np.dot(self.a_output[h-1], self.w[h, :, z_])
                self.a_output[h][z_] = self.sigmoid(self.z_output[h][z_])
        return self.a_output[-1]

    def sigmoid(self, x):
        return 1.0/(1.0 + np.exp(-x))

    def derivative_sigmoid(self, x):
        return self.sigmoid(x)*(1 - self.sigmoid(x))

    def backPropagate(self, targets, eta):
        # https://theclevermachine.wordpress.com/2014/09/06/
        # derivation-error-backpropagation-gradient-descent-for-neural-networks/
        ### dE/dw_jk for the hidden-to-output weights
        self.delta[-1] = (self.a_output[-1] - targets)*self.derivative_sigmoid(self.z_output[-1])
        # delta[-1] is 1 x n_output and a_output[-2] is 1 x n_hidden, so their outer
        # product holds the gradient changes for the weights that go from hidden to output
        self.c_[-1, :, :] = np.outer(self.a_output[-2], self.delta[-1])
        self.w[-1, :, :] -= eta*self.c_[-1, :, :]
        # walk backwards over the hidden layers
        for h in range(self.n_layers - 2, 0, -1):
            for j in range(self.n_nodes):
                # delta_j = g'(z_j) * sum_k delta_k * w_jk, summing over the nodes in layer h+1
                self.delta[h][j] = self.derivative_sigmoid(self.z_output[h][j]) * \
                    np.sum(self.delta[h+1]*self.w[h+1, j, :])
                for i in range(self.n_nodes):
                    # these hold the gradient changes for the weights coming into layer h
                    self.c_[h, i, j] = self.delta[h][j]*self.a_output[h-1][i]
                    # update the weights
                    self.w[h, i, j] -= eta*self.c_[h, i, j]
In [25]:
pat = [
    [[0,0], [1]],
    [[0,1], [1]],
    [[1,0], [1]],
    [[1,1], [0]]
]
# the class uses a single n_nodes for every layer, so use 2 nodes per layer here
myNN = NN_backprop(2, 2, 2)
inputs = pat[0][0]    # first pattern's inputs
targets = pat[0][1]   # first pattern's target
myNN.runNN(inputs)
# myNN.backPropagate(targets, .1)
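Once backPropagate checks out, a rough training loop over the whole pattern set could look like the sketch below. The epoch count and eta are arbitrary, and because the class forces every layer (including the output) to have n_nodes nodes, the single target value just gets broadcast across both output nodes here:
In [ ]:
# rough training-loop sketch; values are arbitrary
myNN = NN_backprop(2, 2, 2)
eta = 0.1
for epoch in range(1000):
    for inputs, targets in pat:
        myNN.runNN(inputs)
        myNN.backPropagate(targets, eta)
# look at the outputs after training
for inputs, targets in pat:
    print(inputs, myNN.runNN(inputs))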
In [32]:
# sanity check: [-1,:,:] grabs the last layer's slice of a 3-D array
np.zeros((3,5,4))[-1,:,:]
Out[32]:
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
In [30]:
# sanity check: negative indexing also counts back from the end of a plain list
[1,2,3,4,5][-2]
Out[30]:
4